Data cleaning

Describe the dataset attributes

HeartDiseaseorAttack: A binary variable indicating whether the individual has a history of heart disease or heart attack.
HighBP: A binary variable indicating whether the individual has high blood pressure.
HighChol: A binary variable indicating whether the individual has high cholesterol.
CholCheck: A binary variable indicating whether the individual has had their cholesterol checked.
BMI: A continuous variable representing the individual's body mass index, which is a measure of body fat based on height and weight.
Smoker: A binary variable indicating whether the individual smokes cigarettes.
Stroke: A binary variable indicating whether the individual has had a stroke.
Diabetes: A binary variable indicating whether the individual has diabetes.
PhysActivity: A categorical variable indicating the level of physical activity of the individual.
Fruits: A continuous variable representing the number of servings of fruits the individual consumes.
Veggies: A continuous variable representing the number of servings of vegetables the individual consumes.
HvyAlcoholConsump: A binary variable indicating whether the individual has heavy alcohol consumption.
AnyHealthcare: A binary variable indicating whether the individual has any healthcare coverage.
NoDocbcCost: A binary variable indicating whether the individual was unable to see a doctor because of cost.
GenHlth: A categorical variable indicating the general health status of the individual.
MentHlth: A categorical variable indicating the mental health status of the individual.
PhysHlth: A continuous variable representing the number of days the individual's physical health was not good.
DiffWalk: A binary variable indicating whether the individual has difficulty walking.
Sex: A categorical variable indicating the sex of the individual.
Age: A continuous variable representing the age of the individual.
Education: A categorical variable indicating the highest level of education completed by the individual.
Income: A categorical variable indicating the income level of the individual.
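
The variable types listed above can be checked against the data itself before cleaning. The following is a minimal sketch, assuming the data is available as CSV text with numeric codes in every column. The sample rows, the helper names infer_kind and summarize, and the threshold of 10 distinct values are illustrative assumptions, not part of the original analysis, and the heuristic can misclassify a column (for example, a day count stored as a few whole numbers).

```python
import csv
import io

# Tiny made-up sample using a few of the column names above; not real data.
SAMPLE_CSV = """HeartDiseaseorAttack,HighBP,BMI,GenHlth,Education
0,1,28.0,3,4
1,1,31.5,5,2
0,0,22.1,1,6
0,0,24.9,2,6
1,1,35.2,4,3
0,1,27.3,3,5
"""


def infer_kind(values, max_categories=10):
    """Guess a column's kind from its distinct values (a heuristic only)."""
    distinct = set(values)
    if distinct <= {0.0, 1.0}:
        return "binary"
    # A small set of whole-number codes suggests a categorical variable.
    if len(distinct) <= max_categories and all(v.is_integer() for v in distinct):
        return "categorical"
    return "continuous"


def summarize(csv_text):
    """Return {column name: inferred kind} for CSV text with a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        name: infer_kind([float(row[name]) for row in rows])
        for name in rows[0]
    }


kinds = summarize(SAMPLE_CSV)
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

Any column whose inferred kind disagrees with its description is worth inspecting by hand before deciding how to encode or scale it.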

Data Exploration & visualization

Model building & Normalization

Histograms: To visualize the distribution of continuous features such as age.
Box plots: To visualize the distribution and spread of continuous features and detect outliers.
Bar plots: To visualize the frequency distribution of categorical features such as education.
Heatmaps: To visualize the correlation matrix between different features.
Scatter plots: To observe the relationship between two variables, with each observation drawn as a dot.
countplot: To show the counts of observations in each categorical bin using bars.
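
To make the plot types above concrete, the quantities each one draws can be computed directly. This is a rough sketch using only the Python standard library so that it runs without a plotting stack. The values for age, bmi, and education are made up for illustration, the 10-year bin width is an arbitrary choice, and the 1.5 * IQR outlier rule is the common box-plot convention, assumed here rather than taken from the original analysis.

```python
from collections import Counter
from math import sqrt
from statistics import mean, quantiles

# Made-up illustrative values; not taken from the dataset.
age = [34, 45, 52, 61, 47, 58, 39, 66, 50, 95]
bmi = [22.0, 27.5, 30.1, 33.0, 26.4, 31.2, 24.3, 35.5, 29.0, 28.0]
education = [4, 6, 5, 4, 6, 6, 3, 4, 5, 6]

# Bar plot / countplot: one bar per category, height = number of rows.
education_counts = Counter(education)

# Histogram: number of values of a continuous feature in each fixed-width bin.
bin_width = 10
age_hist = Counter((a // bin_width) * bin_width for a in age)

# Box plot: quartiles, with points beyond 1.5 * IQR flagged as outliers.
q1, median, q3 = quantiles(age, n=4)
iqr = q3 - q1
outliers = [a for a in age if a < q1 - 1.5 * iqr or a > q3 + 1.5 * iqr]

# Scatter plot: one dot per observation, i.e. the raw (x, y) pairs.
points = list(zip(age, bmi))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Heatmap: each cell is the correlation between one pair of features.
r_age_bmi = pearson(age, bmi)

print("education counts:", dict(education_counts))
print("age histogram:", dict(sorted(age_hist.items())))
print("age outliers:", outliers)
print("corr(age, bmi):", round(r_age_bmi, 3))
```

In a notebook these same quantities are normally drawn rather than printed, for instance with a plotting library such as matplotlib or seaborn. Quartile conventions differ slightly between libraries, so the whiskers drawn by a plotting library may not match the values computed here exactly.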